Neural network language models (NNLMs), including the feed-forward NNLM (FNNLM) and the recurrent NNLM (RNNLM), have proved to be powerful for sequence modeling. One main concern for NNLMs is the heavy computational burden of the output layer, where the output must be probabilistically normalized and the normalizing factors require substantial computation. Fast rescoring of N-best lists or lattices with an NNLM has therefore attracted much attention in large-scale applications. In this paper, the statistical characteristics of the normalizing factors are investigated on the N-best list. Based on these observations, we propose to approximate the normalizing factor for each hypothesis as a constant proportional to the number of words in the hypothesis. The unnormalized NNLM is then investigated and combined with a back-off N-gram for fast rescoring; it can be computed very quickly without normalization in the output layer, reducing the complexity significantly. We apply the proposed method to a well-tuned context-dependent deep neural network hidden Markov model (CD-DNN-HMM) speech recognition system on the English Switchboard phone-call speech-to-text task, where both an FNNLM and an RNNLM are trained to demonstrate the method. Experimental results show that the unnormalized NNLM probability is quite complementary to that of the back-off N-gram, and that combining the unnormalized NNLM with the back-off N-gram further reduces the word error rate at little additional computational cost.
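As a minimal sketch of the approximation described above (the notation s(w_i, h_i) for the unnormalized output-layer score and c for the per-word constant is ours, not the paper's), the exact hypothesis score and its constant-per-word approximation can be written as follows:

```latex
% Exact NNLM log probability of a hypothesis W = w_1 ... w_N:
% each log-normalizer log Z(h_i) requires a sum over the full vocabulary V.
\log P(W) = \sum_{i=1}^{N} \Big[ s(w_i, h_i) - \log Z(h_i) \Big],
\qquad Z(h_i) = \sum_{w' \in V} \exp\!\big( s(w', h_i) \big)

% Approximation: the total normalizing factor of a hypothesis is treated as
% a constant c per word,
\sum_{i=1}^{N} \log Z(h_i) \approx N \cdot c
\quad\Longrightarrow\quad
\log P(W) \approx \sum_{i=1}^{N} s(w_i, h_i) \;-\; N \cdot c
```

Under this sketch, rescoring no longer needs the vocabulary-wide sum in the softmax, and the per-word constant behaves like a word insertion penalty when the unnormalized NNLM score is interpolated with the back-off N-gram score.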